Use this template to complete your project throughout the course. Your Final Project presentation in class will be based on the contents of this document. Replace the title/name and text below with your own, but leave the headers.
In this section, give a brief description of your project and its goal, the data you are using to complete it, and the three faculty/staff members in different fields you have spoken to about your project, with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.
The goal of this project is to evaluate whether features of placental morphology measured during the first trimester of pregnancy can be used to predict whether a baby will be “small for gestational age” (SGA), meaning that the baby's birth weight falls below the 10th percentile for gestational age.
In the first paragraph, describe the problem addressed, its significance, and some background to motivate the problem.
In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.
In the first paragraph, describe the data used and general methodological approach. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.
First, we read in the data. The variables used in the analysis are described below.
| Variable Name | Description |
|---|---|
| model_id | model ID number |
| study_id | study ID number |
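| race | maternal race (raw codes 7, 44, 1, 2, 3; recoded as 1 = white, 2 = black, 3 = asian, other codes set to NA) |
| wtscrn | maternal weight at screening (kg) |
| height | maternal height (in) |
| sbpscrn | maternal systolic blood pressure at screening |
| crl | crown-rump length (mm) |
| pappamom | maternal serum PAPP-A, multiple of the median (MoM) |
| gadel | |
| fetal_sex | fetal sex (0 = male, 1 = female) |
| birthwt | birth weight (g) |
| maternal_age_US1 | maternal age at first ultrasound exam (years) |
| gest_age_US1 | gestational age at first ultrasound exam (days) |
| sga_5th | small for gestational age, birth weight below the 5th percentile (0 = no, 1 = yes) |
| sga_10th | small for gestational age, birth weight below the 10th percentile (0 = no, 1 = yes) |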
library(readxl)
library(tidyverse)
setwd("/Users/alison/BMIN503_placenta")
# Read in the entire clinical sheet
df.clinical <- read_xlsx("/Users/alison/Desktop/BMIN_503/final_project/placenta_subject_data.xlsx")
# Select variables wanted
df.clinical.vars <- df.clinical %>%
  select(model_id = "Model number",
         study_id = "Study ID",
         race = "RACE",
         wtscrn = "WTSCRN (kg)",
         height = "HT (in)",
         sbpscrn = "SBPSCRN",
         crl = "CRL (mm)",
         pappamom = "PAPPAMOM",
         gadel = "GADEL",
         fetal_sex = "FETSEX1",
         birthwt = "BIRTHWT (g)",
         maternal_age_US1 = "Maternal Age at US1",
         gest_age_US1 = "GA US1",
         sga_5th = "SGA<5th%",
         sga_10th = "SGA<10th%")
# Convert the selected columns to numeric (via character, in case any were read as factors)
variables.numeric <- c("race", "wtscrn", "height", "sbpscrn", "crl", "pappamom", "gadel",
                       "fetal_sex", "birthwt", "maternal_age_US1", "gest_age_US1",
                       "sga_5th", "sga_10th")
df.clinical.vars[variables.numeric] <- lapply(df.clinical.vars[variables.numeric],
                                              function(x) as.numeric(as.character(x)))
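The `as.character()` step in the conversion matters only if a column arrives as a factor: `as.numeric()` applied to a factor returns the internal level codes, not the printed values. A small illustration with made-up readings (not study data):

```r
# Made-up blood-pressure readings stored as a factor (hypothetical values)
sbp <- factor(c("120", "95", "110"))

# Levels are sorted as strings: "110", "120", "95"
levels(sbp)

# Direct conversion returns the level indices, not the readings
as.numeric(sbp)                 # 2 3 1

# Converting via character recovers the original numbers
as.numeric(as.character(sbp))   # 120 95 110

# Non-numeric placeholders (e.g. "n/a") become NA, with a coercion warning
suppressWarnings(as.numeric(c("72", "n/a")))  # 72 NA
```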
# Make some variables factors; values outside the listed levels become NA
df.clinical.vars <- df.clinical.vars %>%
  mutate(race = factor(race, levels = c(1, 2, 3), labels = c("white", "black", "asian")),
         fetal_sex = factor(fetal_sex, levels = c(0, 1), labels = c("male", "female")),
         sga_5th = factor(sga_5th, levels = c(0, 1), labels = c("no", "yes")),
         sga_10th = factor(sga_10th, levels = c(0, 1), labels = c("no", "yes")))
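Because `factor()` is given only levels 1, 2 and 3 for `race`, the other raw codes listed in the variable table (7 and 44) become `NA` rather than categories of their own. A quick check with made-up codes (not study data):

```r
# Made-up race codes, including two codes outside levels 1-3
race_raw <- c(1, 2, 3, 7, 44, 1)

race <- factor(race_raw,
               levels = c(1, 2, 3),
               labels = c("white", "black", "asian"))

# Codes 7 and 44 have no matching level, so they become NA
table(race, useNA = "ifany")
sum(is.na(race))  # 2
```

Counting the `NA`s this way in the real data shows how many subjects fall outside the three labeled groups.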
# Read VOCAL measurements (AIUM data)
df.vocal <- read_xlsx("/Users/alison/Desktop/BMIN_503/final_project/placenta_vocal_measures.xlsx")
# Keep the study ID (numeric, to match df.clinical.vars) and the VOCAL volume, thickness and CRL
df.vocal <- df.vocal %>%
  mutate(study_id = as.numeric(`Study ID`)) %>%
  rename(Vvocal = VolumeA, Tvocal = ThicknessA, CRLvocal = `CRL (mm)`) %>%
  select(study_id, Vvocal, Tvocal, CRLvocal)
# Keep subjects present in both sheets, restricted to model IDs 1 to 60
df.clinical.all <- inner_join(df.clinical.vars, df.vocal, by = "study_id") %>%
  filter(model_id %in% 1:60)
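`inner_join()` keeps only keys present in both tables, so subjects missing from either file are dropped without a message, and any non-key column found in both tables is duplicated with `.x`/`.y` suffixes. The later join on `model_id` is why the plots refer to `fetal_sex.x`, which suggests the 3DUS file also carries a `fetal_sex` column. The sketch below reproduces both behaviors with base R's `merge()`, which performs an inner join by default; the tables and values are invented.

```r
# Toy tables keyed by model_id (values are invented, not study data)
clinical <- data.frame(model_id = 1:4,
                       fetal_sex = c(0, 1, 1, 0),
                       birthwt = c(3100, 2500, 2900, 3300))
measures <- data.frame(model_id = c(1, 2, 3, 5),
                       fetal_sex = c(0, 1, 1, 1),
                       Vmanual = c(80000, 65000, 72000, 90000))

# merge() with the default all = FALSE is an inner join
merged <- merge(clinical, measures, by = "model_id")
nrow(merged)   # 3: model 4 (clinical only) and model 5 (3DUS only) are dropped
names(merged)  # "model_id" "fetal_sex.x" "birthwt" "fetal_sex.y" "Vmanual"

# Keys lost in the join, worth reporting before analysis
setdiff(clinical$model_id, measures$model_id)  # 4
setdiff(measures$model_id, clinical$model_id)  # 5
```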
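# Read 3DUS measurements and rescale units
# (assumes Vmanual is in mm^3 and thickness in mm, so these become cm^3 and cm)
df.measures.3d <- read.csv("/Users/alison/Desktop/BMIN_503/final_project/sga_study_allsubjects.csv") %>%
  rename(model_id = Model) %>%
  mutate(Vsnap = Vmanual / 1000,              # placental volume
         Tcmrep_mean = thickness_mean / 10,   # mean placental thickness
         Tcmrep_max = thickness_max / 10)     # maximum placental thickness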
df.merge <- inner_join(df.clinical.all, df.measures.3d, by = "model_id")

Here we perform the exploratory data analysis.
library(GGally)
library(ggplot2)
library(cowplot)
library(ggthemes)
library(plotly)
# Maternal characteristics relative to sga_10th
g1 <- ggplot(data = df.merge, aes(x = race, fill = sga_10th)) +
  geom_bar(position = "stack")
g2 <- ggplot(data = df.merge, aes(x = sga_10th, y = sbpscrn)) +
  geom_boxplot()
g3 <- ggplot(data = df.merge, aes(x = sga_10th, y = height)) +
  geom_boxplot()
g4 <- ggplot(data = df.merge, aes(x = sga_10th, y = wtscrn)) +
  geom_boxplot()
plot_grid(g1, g2, g3, g4, ncol = 2, labels = "AUTO")

# Fetal sex and additional maternal characteristics relative to sga_10th
f1 <- ggplot(data = df.merge, aes(x = race, fill = sga_10th)) +
  geom_bar(position = "stack")
# fetal_sex.x is the clinical-sheet fetal sex (suffix added by the model_id join)
f2 <- ggplot(data = df.merge, aes(x = fetal_sex.x, fill = sga_10th)) +
  geom_bar(position = "stack")
f3 <- ggplot(data = df.merge, aes(x = sga_10th, y = maternal_age_US1)) +
  geom_boxplot()
f4 <- ggplot(data = df.merge, aes(x = sga_10th, y = height)) +
  geom_boxplot()
f5 <- ggplot(data = df.merge, aes(x = sga_10th, y = wtscrn)) +
  geom_boxplot()
plot_grid(f1, f2, f3, f4, f5, ncol = 2, labels = "AUTO")

# g6 <- ggplot(data = df.merge, aes(x = sga_10th, y = gadel)) +
#   geom_boxplot()
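The boxplots and bar charts compare the SGA groups visually only. If a numeric summary is wanted, one option (an assumption, not part of the original analysis) is a Wilcoxon rank-sum test for a continuous maternal variable and Fisher's exact test for fetal sex versus `sga_10th`; both avoid large-sample approximations, which suits small groups. The sketch uses invented values in a hypothetical `toy` data frame standing in for `df.merge`:

```r
# Invented data standing in for df.merge (not study values)
toy <- data.frame(
  sga_10th  = factor(c("no", "no", "no", "no", "no", "yes", "yes", "yes")),
  height    = c(64.1, 66.3, 62.8, 65.5, 67.2, 61.9, 63.4, 60.7),
  fetal_sex = factor(c("male", "female", "male", "female", "male",
                       "female", "female", "male"))
)

# Median height per SGA group
agg <- aggregate(height ~ sga_10th, data = toy, FUN = median)
agg

# Rank-sum test of height between SGA groups (no ties, so an exact p-value)
height_test <- wilcox.test(height ~ sga_10th, data = toy)
height_test$p.value

# 2 x 2 table of fetal sex by SGA status and Fisher's exact test
sex_tab <- table(toy$fetal_sex, toy$sga_10th)
sex_test <- fisher.test(sex_tab)
sex_test$p.value
```

With few SGA cases in the real data, these p-values are best read as descriptive rather than confirmatory.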
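# Volume analysis: VOCAL volume (x) vs. 3DUS volume (y); point shape marks SGA status.
# `text` is not a ggplot2 aesthetic (ggplot2 warns about it), but ggplotly() uses it as hover text.
g <- ggplot(data = df.merge, aes(x = Vvocal, y = Vsnap)) +
  geom_smooth(method = "lm", color = "black", size = 0.1) +
  geom_point(aes(shape = sga_10th, text = paste("Model ID:", model_id)),
             color = "black", size = 2) +
  scale_shape_manual(values = c(1, 3))
  # geom_rangeframe() +
  # theme_tufte()
ggplotly(g)

# The model regresses Vvocal on Vsnap, the reverse of the line drawn in the plot,
# so its slope is not the plotted slope
glm.vol <- glm(Vvocal ~ Vsnap, data = df.merge)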
summary(glm.vol)
##
## Call:
## glm(formula = Vvocal ~ Vsnap, data = df.merge)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -50.834 -10.500 -0.016 6.110 93.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.9789 7.5478 1.720 0.0908 .
## Vsnap 0.6628 0.0907 7.308 8.91e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 431.7668)
##
## Null deviance: 48101 on 59 degrees of freedom
## Residual deviance: 25042 on 58 degrees of freedom
## AIC: 538.31
##
## Number of Fisher Scoring iterations: 2
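For a Gaussian `glm()` with the default identity link, the share of variance explained can be read off the printed deviances as 1 - residual deviance / null deviance, which equals the R² that `lm()` would report for the same formula. Using the values in the summary above:

```r
# Deviances copied from summary(glm.vol) above
null_dev  <- 48101
resid_dev <- 25042

# For a gaussian glm with identity link, 1 - residual/null deviance equals the lm() R-squared.
# With the fitted object: 1 - glm.vol$deviance / glm.vol$null.deviance
r2_vol <- 1 - resid_dev / null_dev
round(r2_vol, 3)  # 0.479
```

So Vsnap accounts for roughly half of the variance in Vvocal in this sample.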
# Thickness analysis: VOCAL thickness (x) vs. mean 3DUS thickness (y)
tmean <- ggplot(data = df.merge, aes(x = Tvocal, y = Tcmrep_mean)) +
  geom_smooth(method = "lm", color = "black", size = 0.1) +
  geom_point(aes(shape = sga_10th, text = paste("Model ID:", model_id)),
             color = "black", size = 2) +
  scale_shape_manual(values = c(1, 3))
ggplotly(tmean)

# The model regresses Tvocal on Tcmrep_mean (the reverse of the plotted line)
glm.thickness_mean <- glm(Tvocal ~ Tcmrep_mean, data = df.merge)
summary(glm.thickness_mean)
##
## Call:
## glm(formula = Tvocal ~ Tcmrep_mean, data = df.merge)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.86258 -0.21914 -0.07454 0.20751 0.83857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2024 0.1645 7.308 8.9e-10 ***
## Tcmrep_mean 0.4136 0.1207 3.427 0.00113 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1193708)
##
## Null deviance: 8.3253 on 59 degrees of freedom
## Residual deviance: 6.9235 on 58 degrees of freedom
## AIC: 46.707
##
## Number of Fisher Scoring iterations: 2
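Regression quantifies how strongly the two thickness measurements are associated, but not whether they agree on the same scale. A common complement for comparing measurement methods (not used in the original analysis) is a Bland-Altman summary: the mean difference between methods (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. A sketch with invented paired values standing in for `Tvocal` and `Tcmrep_mean`; `bland_altman()` is a hypothetical helper, not a library function:

```r
# Invented paired thickness measurements in cm (not study data)
t_vocal <- c(1.6, 1.9, 1.4, 2.1, 1.7, 1.8)
t_3dus  <- c(1.2, 1.6, 1.1, 1.9, 1.3, 1.5)

# Hypothetical helper: bias and 95% limits of agreement between two methods
bland_altman <- function(a, b) {
  d <- a - b                      # per-subject difference between methods
  bias <- mean(d)
  half_width <- 1.96 * sd(d)
  c(bias = bias, lower = bias - half_width, upper = bias + half_width)
}

ba <- bland_altman(t_vocal, t_3dus)
round(ba, 3)
```

Plotting the differences against the pairwise means, with horizontal lines at `ba`, shows whether disagreement grows with placental size.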
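# VOCAL thickness (x) vs. maximum 3DUS thickness (y)
tmax <- ggplot(data = df.merge, aes(x = Tvocal, y = Tcmrep_max)) +
  geom_smooth(method = "lm", color = "black", size = 0.1) +
  geom_point(aes(shape = sga_10th, text = paste("Model ID:", model_id)),
             color = "black", size = 2) +
  scale_shape_manual(values = c(1, 3))
ggplotly(tmax)

# The model regresses Tvocal on Tcmrep_max (the reverse of the plotted line)
glm.thickness_max <- glm(Tvocal ~ Tcmrep_max, data = df.merge)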
summary(glm.thickness_max)
##
## Call:
## glm(formula = Tvocal ~ Tcmrep_max, data = df.merge)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.83248 -0.20510 -0.05782 0.15915 0.83063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.22858 0.15902 7.726 1.76e-10 ***
## Tcmrep_max 0.23178 0.06847 3.385 0.00128 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.119859)
##
## Null deviance: 8.3253 on 59 degrees of freedom
## Residual deviance: 6.9518 on 58 degrees of freedom
## AIC: 46.952
##
## Number of Fisher Scoring iterations: 2
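The models above compare measurement methods; the stated goal, predicting `sga_10th`, calls for a binary-outcome model. One natural option, sketched here as an assumption rather than the analysis actually run, is logistic regression such as `glm(sga_10th ~ Vsnap, family = binomial, data = df.merge)`. The block simulates a stand-in data set so it runs on its own; the simulated effect size is invented. With about 60 subjects and few SGA cases, estimates will be imprecise, so the number of predictors should stay small.

```r
set.seed(503)

# Simulated stand-in for df.merge: 60 subjects, placental volume in cm^3
n <- 60
sim <- data.frame(Vsnap = rnorm(n, mean = 80, sd = 20))
# Assumed relationship for simulation only: smaller volume, higher SGA risk
p_true <- plogis(-1.5 - 0.05 * (sim$Vsnap - 80))
sim$sga_10th <- factor(rbinom(n, 1, p_true), levels = c(0, 1), labels = c("no", "yes"))

# Logistic regression: with a factor outcome, glm() models P(second level, "yes")
fit <- glm(sga_10th ~ Vsnap, family = binomial, data = sim)
summary(fit)$coefficients

# Odds ratio per 10 cm^3 increase in volume, with a Wald 95% CI (lower, estimate, upper)
est <- coef(fit)["Vsnap"]
se  <- sqrt(vcov(fit)["Vsnap", "Vsnap"])
exp(10 * (est + c(-1.96, 0, 1.96) * se))

# Predicted probability of SGA for each subject
sim$p_hat <- predict(fit, type = "response")
```

The predicted probabilities could then feed a ROC analysis or a cross-validated accuracy estimate when judging whether placental measures add predictive value.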
Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.